Abstract

Reliable object recognition is an essential part of most visual systems.
Model based approaches to object recognition use a database (a
library) of modeled objects; for a given set of sensed data the problem
of model based recognition is to identify and locate the objects from
the library that are present in the data. We show that the complexity
of model based recognition depends very heavily on the number of
object models in the library even if each object is modeled by a small
number of discrete features. Specifically, deciding whether a discrete
set of sensed data can be interpreted as transformed object models
from a given library is NP-complete if the transformation is any
combination of translation, rotation, scaling, and perspective projection.
This suggests that efficient algorithms for model based recognition
must use additional structure in order to avoid the inherent
computational difficulties.


This work was supported by the U.S. Army Research Office under Contract DAAL03-86-K-0171
and by the Office of Naval Research under Air Force Contract FJ 9623-90-C-0002.
1 Introduction


Many  tasks  of  perceptual  information  processing  that  are  easy  and  natural 
for  humans  appear  to  be  much  harder  for  machines.  For  example,  although 
locating  an  object  such  as  a  pen  on  a  table  appears  to  us  an  easy  task, 
it  requires  the  ability  to  identify  all  possible  shapes  of  pens  as  such,  and  is 
difficult  to  implement  in  a  machine.  These  difficulties  can  be  avoided  in  many 
computer  vision  applications  that  take  place  in  a  controlled  environment.  In 
these  cases  it  is  assumed  that  the  objects  of  interest  can  be  modeled  and 
catalogued  in  a  library.  The  problem  of  model  based  recognition  can  be 
informally  described  in  the  following  way:  given  a  library  of  modeled  objects 
and  a  set  of  sensed  data,  identify  and  locate  the  objects  from  the  library  that 
are  present  in  the  data. 

Reviews of the extensive literature on model based recognition in computer
vision can be found in [1, 2, 3]; more recent studies include [5, 6, 9, 11].
The standard computational approach is to represent the modeled objects
and the data in terms of discrete features so that the recognition can be
solved as a search problem. These studies indicate that by applying rigidity
constraints in various ways, model based recognition can be efficiently
applied to recognize a small number of objects even from partial views and in
the presence of non-malicious noise. The relevant complexity parameter in
such cases is the number of features that model each object.

In  this  paper  we  analyze  the  case  in  which  objects  are  represented  by  a 
small  number  of  features.  The  relevant  complexity  parameter  in  this  case 
is  the  number  of  objects.  Instead  of  analyzing  the  performance  of  specific 
algorithms,  our  approach  is  to  apply  techniques  from  complexity  theory  to 
identify  cases  in  which  model  based  recognition  appears  to  be  inherently 
difficult. Specifically, we show that the problem is NP-complete, and thus
its worst-case complexity (modulo standard complexity assumptions, i.e., P ≠ NP) is
superpolynomial, and presumably exponential, in the size of the library.

Proving that a problem is NP-complete is a common technique in complexity
analysis for identifying the problem as intrinsically difficult. In a (well
defined) sense, an NP-complete problem is the most difficult problem in the
class NP, which includes many difficult problems such as the traveling
salesman problem. However, an NP-complete problem is not completely unapproachable;
a standard method for coping with such problems is to identify easily solved
sub-problems. In the case of model based recognition this might correspond
to  exploiting  additional  structure  of  the  modeled  objects  and  the  way  they 
are  viewed.  For  more  information  on  the  theory  of  NP-complete  problems 
see  [4].  For  applications  of  NP-completeness  results  to  vision  tasks  see  [8,  10]. 

The  negative  results  of  this  paper  can  be  used  to  determine  constraints 
that  may  simplify  the  problem  of  model  based  recognition.  We  will  attempt 
to identify three types of constraints: constraints that leave the problem
NP-complete, constraints that guarantee efficient (polynomial) algorithms, and
constraints  that  make  our  NP-completeness  proofs  inapplicable,  so  that  they 
may  simplify  the  problem.  The  generic  model  based  recognition  problem  that 
we  consider  is  noise  free  and  assumes  no  occlusion.  An  example  of  constraints 
of  the  first  type  is  that  every  pair  of  local  features  can  be  found  in  at  most 
three  objects  from  the  library.  An  example  of  constraints  of  the  second  type 
is  that  every  pair  of  local  features  can  be  found  in  at  most  two  objects  from 
the  library.  An  example  of  constraints  of  the  third  type  is  occlusion  of  convex 
objects. 


2  Preliminary  definitions 

We  consider  situations  in  which  objects  can  be  described  in  terms  of  sets  of 
local features. A local feature is a simple¹ geometric shape, and an object is
described  by  a  set  of  local  features  and  their  location  in  space.  Commonly 
used  features  are  points,  lines,  angles,  etc.  An  example  is  shown  in  Figure  1, 
where  a  triangle  is  described  in  terms  of  straight  lines  (a),  corners  (b),  and 
points  along  its  edges  (c). 

Definition: An object description by local features is a set of pairs

$$O = \{(f_1, X_1), (f_2, X_2), \ldots, (f_t, X_t)\},$$

where for $1 \le i \le t$, $f_i$ is a local feature and $X_i$ is its location in space relative
to a fixed coordinate system.

Definition:  A  library  is  a  set  of  object  descriptions. 

¹ The results of this paper hold for arbitrary interpretations of "simple" and "local".

Figure 1: Examples of local features.

Definition: A picture is sensed data given as a set of local features and
their location in space.

The  problem  of  model  based  object  recognition  is: 

For a family of coordinate transformations $\Psi$, a library $L$, and a
picture $P = \{(f_1, X_1), \ldots, (f_m, X_m)\}$, determine a disjoint partition
of $P$ into objects from $L$, i.e., subsets $O_1, \ldots, O_q$ such that:

(i) for $i \ne j$, $O_i \cap O_j = \emptyset$; (ii) $P = O_1 \cup \cdots \cup O_q$; (iii) for $1 \le i \le q$
there is $\psi_i \in \Psi$ that transforms an object from $L$ into $O_i$.

Our main result is that the problem of model based recognition under
translations, rotations, scalings, and perspective projections is NP-complete. The proofs
are based on a reduction from exact cover by 3-sets (X3C), which is known to
be NP-complete. (See [4], page 221.)

X3C:  The  following  exact  cover  by  3-sets  problem  is  NP-complete: 

Instance:  a  set  E  of  m  elements  and  a  collection  C  of  3-element  subsets  of 
E. 

Question: does C contain an exact cover for E, i.e., a subcollection C' ⊆ C
such that every element of E occurs in exactly one member of C'?

Comment: X3C remains NP-complete even if no element occurs in more
than three subsets in C, but is solvable in polynomial time if no element
occurs in more than two subsets. A related problem, exact cover by 2-sets,
is solvable in polynomial time.
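To make the X3C question concrete, the following is a minimal brute-force sketch in Python. The function name `exact_cover_by_3_sets` and the input encoding (elements as hashable values, subsets as tuples) are illustrative assumptions, not part of the original formulation. The search tries every subcollection of size |E|/3, so its running time grows exponentially with the size of C, as one would expect for an NP-complete problem.

```python
from itertools import combinations


def exact_cover_by_3_sets(elements, triples):
    """Return a sub-collection of `triples` that covers every element of
    `elements` exactly once, or None if no exact cover exists.

    Hypothetical helper for illustration; exhaustive search, not efficient.
    """
    elements = frozenset(elements)
    triples = [frozenset(t) for t in triples]
    if len(elements) % 3 != 0:
        return None  # an exact cover by 3-sets needs |E| divisible by 3
    q = len(elements) // 3
    for candidate in combinations(triples, q):
        covered = frozenset().union(*candidate)
        # q sets of size 3 can cover all 3q elements only if they are disjoint,
        # so equality with `elements` already implies exactness.
        if covered == elements:
            return [tuple(sorted(t)) for t in candidate]
    return None
```

For example, with E = {1, ..., 6} and C = {(1,2,3), (2,3,4), (4,5,6), (1,5,6)} the sketch returns the cover {(1,2,3), (4,5,6)}.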
Figure 2: The picture in the proof of Theorem 1. (The points $p_1, \ldots, p_m$ lie on a line; consecutive points $p_i$ and $p_{i+1}$ are a distance $m^2 + i$ apart.)

Figure 3: A typical object in the proof of Theorem 1. (The points $p_2$, $p_4$, $p_5$, with $p_2$ and $p_4$ a distance $2m^2 + 5$ apart and $p_4$ and $p_5$ a distance $m^2 + 4$ apart.)

3  The  case  of  translation  and  rotation 

Theorem 1: Let L be a library of objects and let P be a picture. The
decision problem of whether P can be described as a disjoint union of
translated and rotated objects from L is NP-complete. The problem remains
NP-complete even if each object is described by 3 points.

Proof: Membership in NP is obvious. To show that the problem is
NP-complete we reduce X3C to it.

Let $\{E, C\}$ be an instance of the X3C problem. $C$ is a collection of 3-element
subsets of the $m$ elements $e_1, \ldots, e_m \in E$. We begin by constructing a picture
$P$ of $m$ points $p_1, \ldots, p_m$ on the $x$ axis. The location of $p_1$ is at the origin,
the point $p_2$ is at distance $m^2 + 1$ from $p_1$, the point $p_3$ is at distance $m^2 + 2$
from $p_2$, etc. See the illustration in Figure 2. Let $\phi : E \to P$ denote the
mapping of elements in $E$ to points in $P$. For $1 \le i \le m$ we have:

$$\phi(e_i) = \text{a point at } x = (i - 1)m^2 + i(i - 1)/2 \qquad (1)$$

Clearly, $\phi$ is 1-1 and onto, so that the inverse mapping is well defined. We now
create the library $L$ from the 3-element subsets in $C$. For a 3-set composed
of the elements $e_\alpha, e_\beta, e_\gamma$ we add to $L$ an object described by the 3 points
$\phi(e_\alpha)$, $\phi(e_\beta)$, $\phi(e_\gamma)$. The object generated by the elements $e_2, e_4, e_5$ is shown
in Figure 3.
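The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: elements of E are encoded by their indices 1..m, points are encoded by their x coordinates, and the names `position` and `reduce_x3c_to_recognition` are hypothetical.

```python
def position(i, m):
    """x coordinate of phi(e_i) as given by Equation (1), for 1 <= i <= m."""
    return (i - 1) * m * m + i * (i - 1) // 2


def reduce_x3c_to_recognition(m, triples):
    """Build the picture P and the library L from an X3C instance.

    `triples` lists the 3-element subsets of C as tuples of indices in 1..m
    (an assumed encoding). Each library object is a tuple of 3 x coordinates.
    """
    picture = [position(i, m) for i in range(1, m + 1)]
    library = [tuple(sorted(position(i, m) for i in t)) for t in triples]
    return picture, library
```

For m = 6 the picture is at x = 0, 37, 75, 114, 154, 195, so consecutive gaps are m² + 1, ..., m² + 5, and the object generated by e_2, e_4, e_5 has internal gaps 2m² + 5 = 77 and m² + 4 = 40, in agreement with Figure 3. The construction takes time polynomial in m and |C|.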

To prove the NP-completeness result it remains to show that $P$ is a disjoint
union of rotated and translated objects from $L$ if and only if $C$ contains an
exact cover of $E$. The proof is based on Lemma 1, which is proved at the end
of this section.

Let $C' \subseteq C$ be an exact cover of $E$, where $q = m/3 = |C'|$. For
$\{e_{i_1}, e_{i_2}, e_{i_3}\} \in C'$ define $O_i = \{\phi(e_{i_1}), \phi(e_{i_2}), \phi(e_{i_3})\}$, so that $O_i \in L$ for
$1 \le i \le q$. Since $C'$ is a cover of $E$ and $\phi$ is onto, $P = \bigcup_{i=1}^{q} O_i$. Since $C'$ is
exact and $\phi$ is 1-1, $O_i \cap O_j = \emptyset$ for $i \ne j$.

Conversely, let $\Psi$ be the family of coordinate translations and rotations,
and assume $O_i \in L$, $\psi_i \in \Psi$ for $1 \le i \le q$ such that: (i) for $i \ne j$, $\psi_i(O_i) \cap
\psi_j(O_j) = \emptyset$; (ii) $P = \bigcup_{i=1}^{q} \psi_i(O_i)$. From Lemma 1 it follows that $\psi_i$ is the
identity transformation ($\psi_i(O_i) = O_i$), so that $\psi_i(O_i) \in L$ for $1 \le i \le q$. Let
$O_i = \{p_{i_1}, p_{i_2}, p_{i_3}\}$. Define $T_i = \{\phi^{-1}(p_{i_1}), \phi^{-1}(p_{i_2}), \phi^{-1}(p_{i_3})\}$, and $C' = \{T_i :
1 \le i \le q\}$. From (ii) and the fact that $\phi^{-1}$ is onto it follows that $C'$ is a
cover. From (i) and the fact that $\phi^{-1}$ is 1-1 it follows that $C'$ is an exact
cover. □

Lemma  1:  Let  O  be  an  object  from  the  library  defined  in  the  proof  of 
Theorem  1,  and  let  O'  be  an  object  defined  by  3  points  from  the  picture  in 
the  proof  of  Theorem  1.  If  O  can  be  mapped  by  translation  and  rotation  to 
O'  then  O  =  O'. 

Proof: Without loss of generality let $O$ be described by the points $p_{i_1}$, $p_{i_2}$,
$p_{i_3}$ and $O'$ by the points $p_{j_1}$, $p_{j_2}$, $p_{j_3}$, where $i_1 < i_2 < i_3$ and $j_1 < j_2 < j_3$.
Since the objects are 1-dimensional, a transformation taking $O$ to $O'$ involves
either zero rotation or a 180° rotation. We show that the transformation must
be with zero rotation and zero translation.

First, suppose the transformation involves no rotation. Then the distance
between $p_{i_1}$ and $p_{i_2}$ is the same as the distance between $p_{j_1}$ and $p_{j_2}$. From
Equation (1) we have

$$(j_2 - j_1)m^2 + \frac{j_2(j_2 - 1) - j_1(j_1 - 1)}{2} = (i_2 - i_1)m^2 + \frac{i_2(i_2 - 1) - i_1(i_1 - 1)}{2}.$$

Let $s(i,j) = (j(j - 1) - i(i - 1))/2$, so that the above equation can be written
as

$$[(j_2 - j_1) - (i_2 - i_1)]\,m^2 = s(i_1, i_2) - s(j_1, j_2). \qquad (2)$$

Clearly, $0 < s(i,j) < m^2$ for $1 \le i < j \le m$, and $|s(i_1, i_2) - s(j_1, j_2)| < m^2$.
But since the right hand side of Equation (2) is divisible by $m^2$ it must equal
0, and we have

$$s(i_1, i_2) = s(j_1, j_2), \qquad j_2 - j_1 = i_2 - i_1. \qquad (3)$$

The unique solution to the system (3) with $j_1, j_2$ as the unknowns is $j_1 = i_1$
and $j_2 = i_2$. Since in pure translation the distance between $p_{i_2}$ and $p_{i_3}$ is
the same as the distance between $p_{j_2}$ and $p_{j_3}$, the same derivation gives
$j_3 = i_3$, so that $O = O'$.
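The uniqueness of the solution to system (3) is stated without derivation. One way to see it, offered here as a supplementary sketch rather than as the authors' own argument, is to factor $s$:

```latex
\[
  s(i,j) \;=\; \frac{j(j-1) - i(i-1)}{2} \;=\; \frac{(j-i)(i+j-1)}{2}.
\]
Write $\delta = i_2 - i_1 = j_2 - j_1 > 0$. Then $s(i_1,i_2) = s(j_1,j_2)$ gives
\[
  \frac{\delta\,(i_1 + i_2 - 1)}{2} \;=\; \frac{\delta\,(j_1 + j_2 - 1)}{2}
  \quad\Longrightarrow\quad i_1 + i_2 = j_1 + j_2 .
\]
Equal sums and equal differences force $j_1 = i_1$ and $j_2 = i_2$.
```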

It remains to show that a transformation taking $O$ to $O'$ cannot involve
rotation. Suppose, on the contrary, that $O$ is mapped to $O'$ by a transformation
involving nonzero rotation. As mentioned above, this rotation must
be 180°. But then the distance between $p_{i_1}$ and $p_{i_2}$ is the same as the
distance between $p_{j_3}$ and $p_{j_2}$, and the distance between $p_{i_2}$ and $p_{i_3}$ is the same
as the distance between $p_{j_2}$ and $p_{j_1}$. Using the same derivation as above we
get $j_1 = i_3$, $j_2 = i_2$, and $j_3 = i_1$. But since $j_1 < j_3$ and $i_1 < i_3$ we have a
contradiction. □
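Lemma 1 can also be checked numerically for small m. The sketch below is an illustration only, not a substitute for the proof: it enumerates all index triples, records the pair of consecutive gaps of each, and reports failure if two different triples have the same gaps (a pure translation) or reversed gaps (a 180° rotation). The names `position`, `gaps`, and `lemma1_holds` are hypothetical.

```python
from itertools import combinations


def position(i, m):
    """x coordinate of phi(e_i) as given by Equation (1)."""
    return (i - 1) * m * m + i * (i - 1) // 2


def gaps(triple, m):
    """Consecutive distances between the three points, ordered left to right."""
    a, b, c = sorted(position(i, m) for i in triple)
    return (b - a, c - b)


def lemma1_holds(m):
    """True if no two distinct triples of picture points are congruent
    under translation and rotation on the line."""
    seen = set()
    for triple in combinations(range(1, m + 1), 3):
        g = gaps(triple, m)
        # Same gaps: related by translation. Reversed gaps: 180 degree rotation.
        if g in seen or g[::-1] in seen:
            return False
        seen.add(g)
    return True
```

Calling `lemma1_holds(m)` for each m from 3 to 9 is a quick sanity check on the spacing chosen in Equation (1).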

4  Translation,  rotation,  and  scaling 

Theorem 2: Let L be a library of objects and let P be a picture. The
decision problem of whether P can be described as a disjoint union of translated,
rotated, and scaled objects from L is NP-complete. The problem remains
NP-complete even if each object is described by 6 points.

Proof: Membership in NP is obvious. To show that the problem is
NP-complete we reduce X3C to it.

Let $\{E, C\}$ be an instance of the X3C problem. We begin by constructing a
2D picture $Q$ as a disjoint union of two pictures: $Q = P \cup P'$. The pictures
are defined by the two 1-1 and onto mappings $\phi : E \to P$ and $\theta : E \to P'$:

$$\begin{aligned}
\phi(e_i) &= \text{a point at } x = (i - 1)m^2 + i(i - 1)/2,\ y = 0, \\
\theta(e_i) &= \text{a point at } x = (i - 1)m^2 + i(i - 1)/2,\ y = d.
\end{aligned} \qquad (4)$$

See the illustration in Figure 4. We now create the library $L$ from the
3-element subsets of $C$. For $(e_\alpha, e_\beta, e_\gamma)$ we add to $L$ an object described by
the 6 points: $\phi(e_\alpha)$, $\phi(e_\beta)$, $\phi(e_\gamma)$, $\theta(e_\alpha)$, $\theta(e_\beta)$, $\theta(e_\gamma)$. The object generated
by the elements $e_2, e_4, e_5$ is shown in Figure 5.
Figure 4: The picture in the proof of Theorem 2. (The points $p'_1, \ldots, p'_m$ lie on a line at distance $d$ above the points $p_1, \ldots, p_m$; consecutive points $p_i$ and $p_{i+1}$ are a distance $m^2 + i$ apart.)

Figure 5: A typical object in the proof of Theorem 2.

To  complete  the  proof  it  remains  to  show  that  Q  is  a  disjoint  union  of 
translated,  rotated,  and  scaled  objects  from  L  if  and  only  if  C  contains  an 
exact  cover  of  E.  The  proof  is  based  on  Lemma  2  which  will  be  proved  at 
the  end  of  this  section. 

Let $C' \subseteq C$ be an exact cover of $E$, with $q = |C'|$. For $\{e_{i_1}, e_{i_2}, e_{i_3}\} \in C'$
define $O_i = \{\phi(e_{i_1}), \phi(e_{i_2}), \phi(e_{i_3}), \theta(e_{i_1}), \theta(e_{i_2}), \theta(e_{i_3})\}$, so that $O_i \in L$ for
$1 \le i \le q$. Since $C'$ is a cover of $E$, and $\phi$, $\theta$ are onto $P$ and $P'$ respectively,
$Q = P \cup P' = \bigcup_{i=1}^{q} O_i$. Since $C'$ is exact and $\phi$, $\theta$ are 1-1, $O_i \cap O_j = \emptyset$ for
$i \ne j$.

Conversely, let $\Psi$ be the family of coordinate translations, rotations, and
scalings, and assume $O_i \in L$, $\psi_i \in \Psi$ for $1 \le i \le q$ such that: (i) for $i \ne j$,
$\psi_i(O_i) \cap \psi_j(O_j) = \emptyset$; (ii) $Q = \bigcup_{i=1}^{q} \psi_i(O_i)$. From Lemma 2 it follows that $\psi_i$
is the identity transformation, so that $\psi_i(O_i) \in L$ for $1 \le i \le q$. Let $O_i =
\{p_{i_1}, p_{i_2}, p_{i_3}, p'_{i_1}, p'_{i_2}, p'_{i_3}\}$, where we assume without loss of generality that
$p_{i_1}, p_{i_2}, p_{i_3}$ have zero $y$ coordinates. Define $T_i = \{\phi^{-1}(p_{i_1}), \phi^{-1}(p_{i_2}), \phi^{-1}(p_{i_3})\}$,
and $C' = \{T_i : 1 \le i \le q\}$. From (ii) and the fact that $\phi^{-1}$ is onto $E$ it follows
that $C'$ is a cover. From (i) and the fact that $\phi^{-1}$ is 1-1 it follows that $C'$ is
an exact cover. □


Lemma 2: Let O be an object from the library defined in the proof of
Theorem 2, and let O' be an object defined by 6 points from the picture in
the proof of Theorem 2. If O can be mapped by translation, rotation, and
scaling to O' then O = O'.

Proof: Let $O$ be generated by $e_{i_1}, e_{i_2}, e_{i_3}$. Let $u_1, u_2, u_3$ be the points of
$O'$ that correspond under the transformation to $\theta(e_{i_1}), \theta(e_{i_2}), \theta(e_{i_3})$ respectively; then $u_1, u_2, u_3$ are
collinear. Similarly, let $v_1, v_2, v_3$ be the points of $O'$ that correspond to
$\phi(e_{i_1}), \phi(e_{i_2}), \phi(e_{i_3})$ respectively; then $v_1, v_2, v_3$ are collinear. Since $\theta(e_{i_1})$,
$\theta(e_{i_2})$, $\phi(e_{i_1})$ form a right triangle, $u_1, u_2, v_1$ form a right triangle, so that
the triplets $u_1, u_2, u_3$ and $v_1, v_2, v_3$ are not on the same line in the picture.
Therefore, it must be that one triplet lies on the line $y = 0$, and the other on
the line $y = d$, and since the distance between the lines in the library object
is $d$, the transformation involves no scaling.

It remains to show that the transformation involves no translation and
rotation, and this follows from Lemma 1 when applied to the points $u_1, u_2, u_3$
and the library of objects defined by the triplets of points $\{\theta(e_{i_1}), \theta(e_{i_2}),
\theta(e_{i_3})\}$ for $1 \le i \le q$. □

5  The  case  of  perspective  projection 

A perspective projection is the mapping $\pi : \mathbb{R}^3 \to \mathbb{R}^2$ given by

$$x = f\,\frac{X}{Z}; \qquad y = f\,\frac{Y}{Z} \qquad (5)$$

Here it is assumed that the camera is at the origin and pointed directly down
the $Z$ axis. The reference frame is oriented as the image plane, which is
located at distance $f$ from the origin. See [7]. Unlike translation, rotation, and
scaling, perspective projection may destroy geometric properties by merging
lines and points. In the extreme case, any object far enough from the
image plane is projected into a single point in a finite resolution picture. To
eliminate degenerate cases we consider only stable perspective projections.
Definition: A stable perspective projection has the following properties: (i)
Distinct 3D feature points are mapped into distinct 2D feature points. (ii)
Non-collinear 3D feature points are mapped into non-collinear 2D feature
points.

Notice that a small perturbation of the viewing point of an unstable
perspective projection always gives a stable perspective projection.

Theorem 3: Let L be a library of 3D objects, and let P be a 2D picture
given as a set of local features and their 2D location. The decision problem
of whether P can be described as a stable perspective projection of a disjoint
union of translated and rotated objects from L is NP-complete. The problem
remains NP-complete even if each object is described by 12 points.

Proof: Membership in NP is obvious. To show that the problem is
NP-complete we reduce X3C to it.

Let $\{E, C\}$ be an instance of the X3C problem. We begin by constructing
the 2D picture $Q = P_1 \cup P_2 \cup P_3 \cup P_4$, where

$$P_j = \{\phi_j(e_i) : 1 \le i \le m\} \quad \text{for } 1 \le j \le 4,$$

$$\begin{aligned}
\phi_1(e_i) &= \text{a point at } x = (i - 1)m^2 + i(i - 1)/2,\ y = 0, \\
\phi_2(e_i) &= \text{a point at } x = (i - 1)m^2 + i(i - 1)/2,\ y = m^3, \\
\phi_3(e_i) &= \text{a point at } y = (i - 1)m^2 + i(i - 1)/2,\ x = -1, \\
\phi_4(e_i) &= \text{a point at } y = (i - 1)m^2 + i(i - 1)/2,\ x = m^3 + 1.
\end{aligned}$$

Thus, the points $\phi_j(e_i)$ are on the edges of a planar rectangle.

We now create the library $L$ from the 3-element subsets of $C$. For $(e_{i_1}, e_{i_2},
e_{i_3})$ we add to $L$ an object described by the twelve 3D points $\{\phi'_j(e_{i_t}) : 1 \le
j \le 4,\ 1 \le t \le 3\}$, where:

$$\begin{aligned}
\phi'_1(e_i) &= \text{a point at } X = (i - 1)m^2 + i(i - 1)/2,\ Y = 0,\ Z = f, \\
\phi'_2(e_i) &= \text{a point at } X = (i - 1)m^2 + i(i - 1)/2,\ Y = m^3,\ Z = f, \\
\phi'_3(e_i) &= \text{a point at } Y = (i - 1)m^2 + i(i - 1)/2,\ X = -1,\ Z = f, \\
\phi'_4(e_i) &= \text{a point at } Y = (i - 1)m^2 + i(i - 1)/2,\ X = m^3 + 1,\ Z = f.
\end{aligned}$$

Observe that $\pi(\phi'_j(e_i)) = \phi_j(e_i)$ for $1 \le j \le 4$. It remains to show that $Q$ is
a stable perspective projection of a disjoint union of translated and rotated
objects from $L$ if and only if $C$ contains an exact cover of $E$. The proof is
based on Lemma 3, which will be proved at the end of this section.

Let $C' \subseteq C$ be an exact cover of $E$, with $q = |C'|$. For $\{e_{i_1}, e_{i_2}, e_{i_3}\} \in C'$
define $O_i$ as the 3D object described by the twelve 3D points $\{\phi'_j(e_{i_t}) : 1 \le
j \le 4,\ 1 \le t \le 3\}$, so that $O_i \in L$ for $1 \le i \le q$. Since $C'$ is a cover of $E$,
and $\phi_j$ are onto $P_j$ respectively, $Q = \bigcup_{i=1}^{q} \pi(O_i)$. Since $C'$ is exact and $\phi_j$
are 1-1, $\pi(O_i) \cap \pi(O_j) = \emptyset$ for $i \ne j$.

Conversely, let $\Psi$ be the family of coordinate translations and rotations
and assume $O_i \in L$ and $\psi_i \in \Psi$ for $1 \le i \le q$, such that: (i) for $i \ne j$,
$\pi(\psi_i(O_i)) \cap \pi(\psi_j(O_j)) = \emptyset$; (ii) $Q = \bigcup_{i=1}^{q} \pi(\psi_i(O_i))$. From (ii) and Lemma 3
it follows that $\psi_i$ is the identity transformation, so that $\psi_i(O_i) \in L$ for
$1 \le i \le q$. Let $O_i = \{p^j_{i_1}, p^j_{i_2}, p^j_{i_3}\}$ for $1 \le j \le 4$, where we assume without
loss of generality that $p^j_{i_t}$ were generated by $\phi'_j$. Define $T_i = \{(\phi'_j)^{-1}(p^j_{i_t}) :
1 \le j \le 4,\ 1 \le t \le 3\}$, and $C' = \{T_i : 1 \le i \le q\}$. From (ii) and the fact
that $(\phi'_j)^{-1}$ is onto $E$ it follows that $C'$ is a cover. From (i) and the fact that
$(\phi'_j)^{-1}$ is 1-1 it follows that $C'$ is an exact cover. □

Lemma 3: Let O be a 3D object from the library defined in the proof of
Theorem 3, and let O' be an object defined by 12 points from the picture
in the proof of Theorem 3. If O can be mapped by translation, rotation, and
stable perspective projection to O' then the mapping is with zero translation
and rotation.

Proof:  We  use  the  following  properties  of  perspective  projection  (see  [7], 
Chapter  13):  (a)  Collinear  3D  points  are  projected  into  collinear  2D  points, 
(b)  If  the  projection  of  parallel  3D  lines  is  parallel  2D  lines  then  the  3D  lines 
are  parallel  to  the  image  plane. 

Let $O$ be generated by $e_{i_1}, e_{i_2}, e_{i_3}$. Let $L_j$ be the 3D line of the rotated
and translated points $\phi'_j(e_{i_1})$, $\phi'_j(e_{i_2})$, $\phi'_j(e_{i_3})$ for $1 \le j \le 4$, so that $L_1$ is
parallel to $L_2$ and $L_3$ is parallel to $L_4$. Let $u^j_1, u^j_2, u^j_3$ be the points of $O'$
that correspond to $\phi'_j(e_{i_1})$, $\phi'_j(e_{i_2})$, $\phi'_j(e_{i_3})$ respectively; then $u^j_1, u^j_2, u^j_3$ are
collinear for $1 \le j \le 4$, and since the projection is stable, the 4 triplets are
on 4 different lines in the picture. The picture has exactly four lines with at
least 3 points. These lines are: $y = 0$, $y = m^3$, $x = -1$, and $x = m^3 + 1$.
Therefore, the 4 triplets come from these 4 lines.

Let $l_j$ be the projection of $L_j$ for $1 \le j \le 4$. $l_1$ intersects with two lines
from $\{l_2, l_3, l_4\}$ and is parallel to the third. Since $L_1$ intersects with $L_3$ and
$L_4$, $l_1$ intersects with $l_3$, $l_4$, and is parallel to $l_2$. Thus, we have two parallel
lines $L_1, L_2$ that are projected into parallel lines. Therefore, both $L_1$ and $L_2$
must be parallel to the image plane; let $Z_1$ and $Z_2$ be their depth. From the
same arguments the lines $L_3$, $L_4$ are parallel to the image plane; let $Z_3$, $Z_4$
be their depth respectively. But since $L_3$ intersects with both $L_1$ and $L_2$ we
have $Z_1 = Z_2 = Z_3 = Z_4$.

We conclude that all the points of the translated and rotated object $O$
have the same distance from the image plane. From Equation (5) it follows
that in this case the distance from the image plane has the effect of scaling the
object. Thus, Lemma 3 follows from Lemma 2 when applied to the library
of objects defined by $\phi'_j(e_{i_1})$, $\phi'_j(e_{i_2})$, $\phi'_j(e_{i_3})$ and the 6 points $u^j_1, u^j_2, u^j_3$ for
$1 \le j \le 2$. □
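The scaling remark in the proof of Lemma 3 can be illustrated numerically. The sketch below implements the projection of Equation (5) and applies it to points that share the same depth Z; the projected coordinates are the original (X, Y) multiplied by the common factor f/Z. The function names `project` and `project_all` are hypothetical, and the sketch assumes Z is nonzero.

```python
def project(point, f):
    """Perspective projection of Equation (5): camera at the origin,
    looking down the Z axis, image plane at distance f. Assumes Z != 0."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


def project_all(points, f):
    """Project a list of 3D points onto the image plane."""
    return [project(p, f) for p in points]
```

For instance, with f = 2 the points (0, 0, 4), (8, 0, 4), (0, 12, 4) project to (0, 0), (4, 0), (0, 6), a copy of the planar shape scaled by f/Z = 1/2. Points with Z = f project to their own (X, Y) coordinates, which is why the library objects of Theorem 3 are placed at depth f.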


6  Implications 

In  this  section  we  identify  constraints  that  can  potentially  simplify  model 
based  recognition,  and  other  constraints  that  leave  the  problem  NP-complete. 

Local  features  other  than  a  point:  With  no  additional  structure  this  can 
only  make  the  problem  more  difficult.  However,  with  additional  structure  of 
the  local  features  the  problem  may  become  polynomial.  For  example,  straight 
lines  may  have  an  additional  constraint  that  their  ends  meet  (see  Figure  1). 

Occlusion:  Without  additional  structure  this  can  only  make  the  problem 
more  difficult.  However,  with  additional  constraints  such  as  convexity  this 
makes  our  NP-completeness  proofs  inapplicable,  so  that  it  may  potentially 
simplify  the  problem. 

A  small  number  of  feature  points:  If  each  object  is  described  by  2  points 
the  problem  is  polynomially  solvable  by  matching  techniques. 

A  large  number  of  feature  points:  Without  additional  structure  this 
can  only  make  the  problem  more  difficult.  However,  if  it  is  assumed  that 
small  subsets  of  these  points  determine  a  unique  object  from  the  library  then 
the  problem  is  polynomially  solvable.  (This  is  the  essential  assumption  in 
geometric  hashing  [9]). 
Almost distinct subsets: If the distance between every pair of feature
points determines at most two objects, the problem is polynomially
solvable. If this distance can determine three (or more) objects, the problem is
still NP-complete. This follows from the comment in the definition of X3C.

Dimensionality: Notice that the results of Theorem 1 hold also for
translation and rotation in 2 and 3 dimensions. Similarly, the results of Theorem 2
hold also for 3 dimensions.


7  Concluding  remarks 

We  have  shown  that  the  problem  of  model  based  recognition  is  NP-complete. 
Thus,  there  is  little  hope  for  a  performance  guaranteed  algorithm  that  can 
solve  the  problem  efficiently.  However,  it  is  still  possible  that  easy  sub-classes 
of  the  problem  can  be  characterized  by  additional  structure  of  the  modeled 
objects  (e.g.,  convexity)  and  the  way  they  are  viewed  (e.g.,  occlusion).  Our 
results  can  help  determine  what  constraints  are  potentially  useful. 